NSF PAR Search | NSF Public Access Repository

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge until the embargo (administrative interval) ends.

Fusing Reward and Dueling Feedback in Stochastic Bandits

Wang, Xuchuang; Zeng, Qirun; Zuo, Jinhang; Liu, Xutong; Hajiesmaili, Mohammad; Lui, John; Wierman, Adam (July 2025, ICML)

Free, publicly-accessible full text available July 25, 2026
Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions

Ghaffari, Fatemeh; Wang, Xuchuang; Zuo, Jinhang; Hajiesmaili, Mohammad (June 2025, Annual Conference on Learning for Dynamics and Control (L4DC))

Free, publicly-accessible full text available June 5, 2026
Stochastic Bandits Robust to Adversarial Attacks

Wang, Xuchuang; Zuo, Jinhang; Liu, Xutong; Lui, John; Hajiesmaili, Mohammad (April 2025, ICLR)

Free, publicly-accessible full text available April 28, 2026
Heterogeneous Multi-Agent Bandits with Parsimonious Hints

https://doi.org/10.1609/aaai.v39i18.34143

Mirfakhar, Amirmahdi; Wang, Xuchuang; Zuo, Jinhang; Zick, Yair; Hajiesmaili, Mohammad (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms. In this framework, each of the M agents has a unique reward distribution over K arms, and in T rounds, they can observe the reward of the arm they pull only if no other agent pulls that arm. The goal is to maximize the total utility by querying the minimal necessary hints without pulling arms, achieving time-independent regret. We study HMA2B in both centralized and decentralized setups. Our main centralized algorithm, GP-HCLA, which is an extension of HCLA, uses a central decision-maker for arm-pulling and hint queries, achieving O(M^4 K) regret with O(M K log T) adaptive hints. In decentralized setups, we propose two algorithms, HD-ETC and EBHD-ETC, that allow agents to choose actions independently through collision-based communication and query hints uniformly until stopping, yielding O(M^3 K^2) regret with O(M^3 K log T) hints, where the former requires knowledge of the minimum gap and the latter does not. Finally, we establish lower bounds to prove the optimality of our results and verify them through numerical simulations.
Free, publicly-accessible full text available April 11, 2026
Combinatorial Logistic Bandits

https://doi.org/10.1145/3726854.3727279

Liu, Xutong; Dai, Xiangxiang; Wang, Xuchuang; Hajiesmaili, Mohammad; Lui, John CS (June 2025, ACM)

Free, publicly-accessible full text available June 9, 2026
Heterogeneous Multi-Agent Bandits with Parsimonious Hints

Mirfakhar, Amirmahdi; Wang, Xuchuang; Zuo, Jinhang; Zick, Yair; Hajiesmaili, Mohammad (February 2025, AAAI)

Free, publicly-accessible full text available February 10, 2026
Stochastic Bandits Robust to Adversarial Attacks

Wang, Xuchuang; Liu, Maoli; Zuo, Jinhang; Liu, Xutong; Lui, John; Hajiesmaili, Mohammad (January 2025, Proceedings of the Thirteenth International Conference on Learning Representations)

Free, publicly-accessible full text available January 22, 2026
Asynchronous Multi-Agent Bandits: Fully Distributed vs. Leader-Coordinated Algorithms

https://doi.org/10.1145/3711696

Wang, Xuchuang; Chen, Yu-Zhen Janice; Liu, Xutong; Yang, Lin; Hajiesmaili, Mohammad; Towsley, Don; Lui, John CS (March 2025, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

We study the cooperative asynchronous multi-agent multi-armed bandits problem, where each agent's active (arm pulling) decision rounds are asynchronous. That is, in each round, only a subset of agents is active to pull arms, and this subset is unknown and time-varying. We consider two models of multi-agent cooperation, fully distributed and leader-coordinated, and propose algorithms for both models that attain near-optimal regret and communications bounds, both of which are almost as good as their synchronous counterparts. The fully distributed algorithm relies on a novel communication policy consisting of accuracy adaptive and on-demand components, and successive arm elimination for decision-making. For leader-coordinated algorithms, a single leader explores arms and recommends them to other agents (followers) to exploit. As agents' active rounds are unknown, a competent leader must be chosen dynamically. We propose a variant of the Tsallis-INF algorithm with low switches to choose such a leader sequence. Lastly, we report numerical simulations of our new asynchronous algorithms with other known baselines.
Free, publicly-accessible full text available March 6, 2026
Asynchronous Multi-Agent Bandits: Fully Distributed vs. Leader-Coordinated Algorithms

https://doi.org/10.1145/3726854.3727272

Wang, Xuchuang; Chen, Yu-Zhen Janice; Liu, Xutong; Yang, Lin; Hajiesmaili, Mohammad; Towsley, Don; Lui, John CS (June 2025, ACM)

Free, publicly-accessible full text available June 9, 2026
Best Arm Identification with Quantum Oracles

Wang, Xuchuang; Chen, Yu-Zhen; Guedes de Andrade, Matheus; Allcock, Jonathan; Hajiesmaili, Mohammad; Lui, John; Towsley, Don (February 2025, AAAI)

Free, publicly-accessible full text available February 10, 2026
